{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "# Sequence to sequence learning for performing number addition\n",
    "\n",
    "**Author:** [Smerity](https://twitter.com/Smerity) and others<br>\n",
    "**Date created:** 2015/08/17<br>\n",
    "**Last modified:** 2020/04/17<br>\n",
    "**Description:** A model that learns to add strings of numbers, e.g. \"535+61\" -> \"596\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Introduction\n",
    "\n",
    "In this example, we train a model to learn to add two numbers, provided as strings.\n",
    "\n",
    "**Example:**\n",
    "\n",
    "- Input: \"535+61\"\n",
    "- Output: \"596\"\n",
    "\n",
    "Input may optionally be reversed, which was shown to increase performance in many tasks\n",
    " in: [Learning to Execute](http://arxiv.org/abs/1410.4615) and\n",
    "[Sequence to Sequence Learning with Neural Networks](\n",
    "\n",
    " http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf)\n",
    "\n",
    "Theoretically, sequence order inversion introduces shorter term dependencies between\n",
    " source and target for this problem.\n",
    "\n",
    "**Results:**\n",
    "\n",
    "For two digits (reversed):\n",
    "\n",
    "+ One layer LSTM (128 HN), 5k training examples = 99% train/test accuracy in 55 epochs\n",
    "\n",
    "Three digits (reversed):\n",
    "\n",
    "+ One layer LSTM (128 HN), 50k training examples = 99% train/test accuracy in 100 epochs\n",
    "\n",
    "Four digits (reversed):\n",
    "\n",
    "+ One layer LSTM (128 HN), 400k training examples = 99% train/test accuracy in 20 epochs\n",
    "\n",
    "Five digits (reversed):\n",
    "\n",
    "+ One layer LSTM (128 HN), 550k training examples = 99% train/test accuracy in 30 epochs\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Setup\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "from tensorflow import keras\n",
    "from tensorflow.keras import layers\n",
    "import numpy as np\n",
    "\n",
    "# Parameters for the model and dataset.\n",
    "TRAINING_SIZE = 50000\n",
    "DIGITS = 3\n",
    "REVERSE = True\n",
    "\n",
    "# Maximum length of input is 'int + int' (e.g., '345+678'). Maximum length of\n",
    "# int is DIGITS.\n",
    "MAXLEN = DIGITS + 1 + DIGITS\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Generate the data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "\n",
    "class CharacterTable:\n",
    "    \"\"\"Given a set of characters:\n",
    "    + Encode them to a one-hot integer representation\n",
    "    + Decode the one-hot or integer representation to their character output\n",
    "    + Decode a vector of probabilities to their character output\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(self, chars):\n",
    "        \"\"\"Initialize character table.\n",
    "        # Arguments\n",
    "            chars: Characters that can appear in the input.\n",
    "        \"\"\"\n",
    "        self.chars = sorted(set(chars))\n",
    "        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))\n",
    "        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))\n",
    "\n",
    "    def encode(self, C, num_rows):\n",
    "        \"\"\"One-hot encode given string C.\n",
    "        # Arguments\n",
    "            C: string, to be encoded.\n",
    "            num_rows: Number of rows in the returned one-hot encoding. This is\n",
    "                used to keep the # of rows for each data the same.\n",
    "        \"\"\"\n",
    "        x = np.zeros((num_rows, len(self.chars)))\n",
    "        for i, c in enumerate(C):\n",
    "            x[i, self.char_indices[c]] = 1\n",
    "        return x\n",
    "\n",
    "    def decode(self, x, calc_argmax=True):\n",
    "        \"\"\"Decode the given vector or 2D array to their character output.\n",
    "        # Arguments\n",
    "            x: A vector or a 2D array of probabilities or one-hot representations;\n",
    "                or a vector of character indices (used with `calc_argmax=False`).\n",
    "            calc_argmax: Whether to find the character index with maximum\n",
    "                probability, defaults to `True`.\n",
    "        \"\"\"\n",
    "        if calc_argmax:\n",
    "            x = x.argmax(axis=-1)\n",
    "        return \"\".join(self.indices_char[x] for x in x)\n",
    "\n",
    "\n",
    "# All the numbers, plus sign and space for padding.\n",
    "chars = \"0123456789+ \"\n",
    "ctable = CharacterTable(chars)\n",
    "\n",
    "questions = []\n",
    "expected = []\n",
    "seen = set()\n",
    "print(\"Generating data...\")\n",
    "while len(questions) < TRAINING_SIZE:\n",
    "    f = lambda: int(\n",
    "        \"\".join(\n",
    "            np.random.choice(list(\"0123456789\"))\n",
    "            for i in range(np.random.randint(1, DIGITS + 1))\n",
    "        )\n",
    "    )\n",
    "    a, b = f(), f()\n",
    "    # Skip any addition questions we've already seen\n",
    "    # Also skip any such that x+Y == Y+x (hence the sorting).\n",
    "    key = tuple(sorted((a, b)))\n",
    "    if key in seen:\n",
    "        continue\n",
    "    seen.add(key)\n",
    "    # Pad the data with spaces such that it is always MAXLEN.\n",
    "    q = \"{}+{}\".format(a, b)\n",
    "    query = q + \" \" * (MAXLEN - len(q))\n",
    "    ans = str(a + b)\n",
    "    # Answers can be of maximum size DIGITS + 1.\n",
    "    ans += \" \" * (DIGITS + 1 - len(ans))\n",
    "    if REVERSE:\n",
    "        # Reverse the query, e.g., '12+345  ' becomes '  543+21'. (Note the\n",
    "        # space used for padding.)\n",
    "        query = query[::-1]\n",
    "    questions.append(query)\n",
    "    expected.append(ans)\n",
    "print(\"Total questions:\", len(questions))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Vectorize the data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "print(\"Vectorization...\")\n",
    "x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=np.bool)\n",
    "y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=np.bool)\n",
    "for i, sentence in enumerate(questions):\n",
    "    x[i] = ctable.encode(sentence, MAXLEN)\n",
    "for i, sentence in enumerate(expected):\n",
    "    y[i] = ctable.encode(sentence, DIGITS + 1)\n",
    "\n",
    "# Shuffle (x, y) in unison as the later parts of x will almost all be larger\n",
    "# digits.\n",
    "indices = np.arange(len(y))\n",
    "np.random.shuffle(indices)\n",
    "x = x[indices]\n",
    "y = y[indices]\n",
    "\n",
    "# Explicitly set apart 10% for validation data that we never train over.\n",
    "split_at = len(x) - len(x) // 10\n",
    "(x_train, x_val) = x[:split_at], x[split_at:]\n",
    "(y_train, y_val) = y[:split_at], y[split_at:]\n",
    "\n",
    "print(\"Training Data:\")\n",
    "print(x_train.shape)\n",
    "print(y_train.shape)\n",
    "\n",
    "print(\"Validation Data:\")\n",
    "print(x_val.shape)\n",
    "print(y_val.shape)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Build the model\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "print(\"Build model...\")\n",
    "num_layers = 1  # Try to add more LSTM layers!\n",
    "\n",
    "model = keras.Sequential()\n",
    "# \"Encode\" the input sequence using a LSTM, producing an output of size 128.\n",
    "# Note: In a situation where your input sequences have a variable length,\n",
    "# use input_shape=(None, num_feature).\n",
    "model.add(layers.LSTM(128, input_shape=(MAXLEN, len(chars))))\n",
    "# As the decoder RNN's input, repeatedly provide with the last output of\n",
    "# RNN for each time step. Repeat 'DIGITS + 1' times as that's the maximum\n",
    "# length of output, e.g., when DIGITS=3, max output is 999+999=1998.\n",
    "model.add(layers.RepeatVector(DIGITS + 1))\n",
    "# The decoder RNN could be multiple layers stacked or a single layer.\n",
    "for _ in range(num_layers):\n",
    "    # By setting return_sequences to True, return not only the last output but\n",
    "    # all the outputs so far in the form of (num_samples, timesteps,\n",
    "    # output_dim). This is necessary as TimeDistributed in the below expects\n",
    "    # the first dimension to be the timesteps.\n",
    "    model.add(layers.LSTM(128, return_sequences=True))\n",
    "\n",
    "# Apply a dense layer to the every temporal slice of an input. For each of step\n",
    "# of the output sequence, decide which character should be chosen.\n",
    "model.add(layers.Dense(len(chars), activation=\"softmax\"))\n",
    "model.compile(loss=\"categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"])\n",
    "model.summary()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Train the model\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "epochs = 30\n",
    "batch_size = 32\n",
    "\n",
    "\n",
    "# Train the model each generation and show predictions against the validation\n",
    "# dataset.\n",
    "for epoch in range(1, epochs):\n",
    "    print()\n",
    "    print(\"Iteration\", epoch)\n",
    "    model.fit(\n",
    "        x_train,\n",
    "        y_train,\n",
    "        batch_size=batch_size,\n",
    "        epochs=1,\n",
    "        validation_data=(x_val, y_val),\n",
    "    )\n",
    "    # Select 10 samples from the validation set at random so we can visualize\n",
    "    # errors.\n",
    "    for i in range(10):\n",
    "        ind = np.random.randint(0, len(x_val))\n",
    "        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]\n",
    "        preds = np.argmax(model.predict(rowx), axis=-1)\n",
    "        q = ctable.decode(rowx[0])\n",
    "        correct = ctable.decode(rowy[0])\n",
    "        guess = ctable.decode(preds[0], calc_argmax=False)\n",
    "        print(\"Q\", q[::-1] if REVERSE else q, end=\" \")\n",
    "        print(\"T\", correct, end=\" \")\n",
    "        if correct == guess:\n",
    "            print(\"\u2611 \" + guess)\n",
    "        else:\n",
    "            print(\"\u2612 \" + guess)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "You'll get to 99+% validation accuracy after ~30 epochs.\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "addition_rnn",
   "private_outputs": false,
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}